YouTube videos: Test-Time Reinforcement Learning
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention (June 2025)
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Heimdall: Test-time scaling on the generative verification (Apr 2025)
SakanaAI Introduces 'Transformer Squared' with Test-Time Learning
Train LLMs Without Labels? TAO Just Changed the Game! | Databricks
Wait, Think Again!—Simple test-time scaling (Paper Walkthrough)
[UCLA RL-LLM] Chapter 1.5: AlphaGo, test-time compute, and expert iteration
Andi Peng—A Human-in-the-Loop Framework for Test-Time Policy Adaptation
Fine-Tuning with Prompts: How TAO (Test-time Adaptive Optimization) is Changing the AI Game
The Key Ingredients of Optimizing Test-Time Compute and What's Still Missing
[QA] Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Scaling Test-Time Compute Without Verification or RL is Suboptimal (February 2025)
Machine Race - Test 1 - Real time Reinforcement Learning
s1: Simple test-time scaling: Just “wait…” + 1,000 training examples? | PAPER EXPLAINED
Reinforcement Learning and Test-Time Training (AI paper review)
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Reinforcement Learning Teachers of Test Time Scaling
AI Self-Evolving on Unlabeled Data? The Remarkable Results of the New Reinforcement Learning Method TTRL (2025-04) [Paper Explanation Series]
TTRL: LLMs Self-Improve with RL